restrict rest workers based on open files limit #4108
Conversation
Pull request overview
Updates OVMS REST worker configuration to account for Linux open-files limits, preventing overly large rest_workers values and adding regression tests.
Changes:
- Compute a Linux-specific default for `rest_workers` based on RLIMIT_NOFILE and add a validation cap tied to open-files availability.
- Add death tests validating behavior when the open-files limit is low.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/config.cpp | Adds Linux open-files-based default/validation logic for rest_workers. |
| src/test/ovmsconfig_test.cpp | Adds tests that manipulate RLIMIT_NOFILE to validate the new rest_workers constraints. |
src/config.cpp
Outdated
```cpp
// We need to minimize the number of default drogon workers since this value is set for both: unary and streaming (making it always double)
// on linux, restrict also based on the max allowed number of open files
#ifdef __linux__
#include <sys/resource.h>
const uint64_t MAX_OPEN_FILES = []() {
```
#include <sys/resource.h> is placed inside namespace ovms (and mid-file). Including system headers inside a namespace can unintentionally put all their declarations/types into that namespace and create subtle lookup/type issues. Move this include up with the other includes at file scope (outside the namespace) and keep only the constants inside ovms.
src/config.cpp
Outdated
```cpp
if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
    std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
    return false;
}
```
On Linux, this check subtracts RESERVED_OPEN_FILES from MAX_OPEN_FILES without guarding MAX_OPEN_FILES <= RESERVED_OPEN_FILES. With unsigned types this can underflow and effectively disable the limit check. Add a guard (or use a saturating subtraction) before computing (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5, and consider emitting a clear error when the open-files limit is too low to run REST workers at all.
Suggested change:

```diff
-if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
-    std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
-    return false;
-}
+if (MAX_OPEN_FILES <= RESERVED_OPEN_FILES) {
+    std::cerr << "Open files limit (" << MAX_OPEN_FILES
+              << ") is too low to run REST workers. Increase the RLIMIT_NOFILE soft limit "
+              << "or adjust the server configuration." << std::endl;
+    return false;
+}
+const uint64_t maxRestWorkersByFiles = (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5;
+if (restWorkers() > maxRestWorkersByFiles) {
+    std::cerr << "rest_workers count cannot be larger than " << maxRestWorkersByFiles
+              << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
+    return false;
+}
```
src/test/ovmsconfig_test.cpp
Outdated
```cpp
#include <regex>
#include <sys/resource.h>
```
<sys/resource.h> (and the new getrlimit/setrlimit-based tests) are Linux-specific, but the include is unconditional. This will break builds on non-Linux platforms. Wrap the include and these two new tests in #ifdef __linux__ (or equivalent platform guard).
Suggested change:

```diff
 #include <regex>
+#ifdef __linux__
 #include <sys/resource.h>
+#endif
```
```cpp
TEST_F(OvmsConfigDeathTest, restWorkersDefaultReducedForOpenFilesLimit) {
    // limit allowed number of open files to 1024 to make sure that rest_workers count is too large for the limit based on number of cpu cores alone
    int cpu_cores = ovms::getCoreCount();
    struct rlimit limit;
    ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
    struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
    std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
```
This test comment says the open-files limit is set to 1024, but the code actually sets it to cpu_cores * 5. Please update the comment to match the behavior (or adjust the limit if 1024 is intended).
```cpp
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &newLimit), 0);
```
setrlimit(RLIMIT_NOFILE, ...) can fail if newLimit.rlim_cur exceeds limit.rlim_max (or if the process lacks permission to change limits). As written, cpu_cores * 5 might be > rlim_max on some systems/containers, making this test flaky. Consider clamping rlim_cur to limit.rlim_max and/or skipping the test with a clear message when the limit cannot be adjusted.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```cpp
    return false;
}
#ifdef __linux__
if (restWorkers() > (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6) {
```
why is it 6 here, but 7 in getDefaultRestWorkers()?
```cpp
if (maxOpenFiles <= RESERVED_OPEN_FILES) {
    return static_cast<uint64_t>(2); // minimum functional number
}
return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
```
Can you add a log stating that the number of workers has been decreased due to the open files limit?
```cpp
char* n_argv[] = {"ovms", "--config_path", "/path1", "--rest_port", "8080", "--port", "8081"};
int arg_count = 7;
ovms::Config::instance().parse(arg_count, n_argv);
EXPECT_TRUE(ovms::Config::instance().validate());
```
This test doesn't check whether the number of workers has actually been changed.
```cpp
    return false;
}
#ifdef __linux__
if (restWorkers() > (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6) {
```
Is the default rest_workers value also the maximum possible rest_workers value?
🛠 Summary
The default number of REST workers, based only on the number of CPUs, might exceed the open files limit. This can happen, for example, in Kubernetes on a Xeon CPU with almost 400 cores and a ulimit of 1024 files.
This change makes the default number of REST workers safe in such conditions by including the open files limit as a criterion when allowing initialization and open connections. The ratio between REST workers and open files accounts for files needed by drogon, mediapipe graphs, and client connections.
When the number of REST workers is set explicitly and the required number of open files exceeds the limit, a clear error is raised and OVMS exits without attempting to load the model.
🧪 Checklist